Optimization of Cross-Lingual Voice Conversion With Linguistics Losses to Reduce Foreign Accents

Abstract

Cross-lingual voice conversion (XVC) transforms the speaker identity of a source speaker to that of a target speaker who speaks a different language. Because of intrinsic differences between languages, the converted speech may carry an unwanted foreign accent. In this paper, we first investigate the intelligibility of the converted speech and confirm the performance degradation caused by the accent/intelligibility issue. With the goal of generating native-sounding speech, the paper further proposes a novel training scheme with two additional linguistic losses for waveform generation: 1) a frame-wise phonetic content loss derived from bottleneck features, and 2) an automatic speech recognition (ASR) loss on characters. Experiments were conducted on English and Mandarin Chinese conversions. The experimental results confirmed that the generated speech sounds more natural and that the proposed solution significantly improves intelligibility.
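The abstract names the two extra linguistic losses but does not give their exact form. As a rough illustration only, the sketch below assumes that the phonetic content loss is a mean L1 distance between bottleneck features (BNFs) of the generated and reference speech, extracted by the same frozen ASR encoder, and that the character-level ASR loss is a CTC loss on the ASR posteriors of the generated speech. The function names, the L1 and CTC choices, and the weights `lambda_pc` and `lambda_asr` are hypothetical and are not taken from the paper.

```python
from typing import Sequence

import numpy as np


def phonetic_content_loss(bnf_gen: np.ndarray, bnf_ref: np.ndarray) -> float:
    """Frame-wise phonetic content loss, ASSUMED here to be a mean L1 distance.

    bnf_gen / bnf_ref: (frames, dims) bottleneck features of the generated and
    reference speech, assumed to come from the same frozen ASR encoder and to
    be frame-aligned (same number of frames).
    """
    if bnf_gen.shape != bnf_ref.shape:
        raise ValueError("bottleneck feature sequences must be frame-aligned")
    return float(np.mean(np.abs(bnf_gen - bnf_ref)))


def ctc_loss(log_probs: np.ndarray, target: Sequence[int], blank: int = 0) -> float:
    """Character-level ASR loss, ASSUMED here to be CTC (forward algorithm).

    log_probs: (T, C) per-frame log-posteriors over characters, blank included.
    target: character indices of the transcript; none may equal `blank`.
    Returns the negative log-likelihood of `target` (inf if T is too short).
    """
    num_frames = log_probs.shape[0]
    # Extended label sequence with blanks: [b, c1, b, c2, ..., cN, b].
    ext = [blank]
    for c in target:
        ext += [c, blank]
    num_states = len(ext)

    alpha = np.full((num_frames, num_states), -np.inf)
    alpha[0, 0] = log_probs[0, blank]
    if num_states > 1:
        alpha[0, 1] = log_probs[0, ext[1]]

    for t in range(1, num_frames):
        for s in range(num_states):
            acc = alpha[t - 1, s]  # stay on the same state
            if s >= 1:  # advance by one state
                acc = np.logaddexp(acc, alpha[t - 1, s - 1])
            # Skip over a blank, allowed only between two different characters.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                acc = np.logaddexp(acc, alpha[t - 1, s - 2])
            alpha[t, s] = acc + log_probs[t, ext[s]]

    # A valid path ends on the final blank or on the last character.
    log_likelihood = alpha[-1, -1]
    if num_states > 1:
        log_likelihood = np.logaddexp(log_likelihood, alpha[-1, -2])
    return float(-log_likelihood)


def total_loss(base_loss, bnf_gen, bnf_ref, asr_log_probs, char_target,
               lambda_pc=1.0, lambda_asr=1.0):
    """Waveform-generation loss plus the two linguistic terms (weights hypothetical)."""
    return (base_loss
            + lambda_pc * phonetic_content_loss(bnf_gen, bnf_ref)
            + lambda_asr * ctc_loss(asr_log_probs, char_target))
```

CTC is used in this sketch because character transcripts are not frame-aligned, whereas the BNF term compares features frame by frame. For example, with two frames of uniform posteriors over {blank, "a"}, the target "a" is reachable by three paths ("aa", "blank a", "a blank"), so `ctc_loss` returns -log(0.75).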

Similar resources

Cross-Lingual Voice Conversion

Cross-lingual voice conversion refers to the automatic transformation of a source speaker’s voice to a target speaker’s voice in a language that the target speaker cannot speak. It involves a set of statistical analysis, pattern recognition, machine learning, and signal processing techniques. This study focuses on the problems related to cross-lingual voice conve...

Frame alignment method for cross-lingual voice conversion

Most of the existing voice conversion methods calculate the optimal transformation function from a given set of paired acoustic vectors of the source and target speakers. The alignment of the phonetically equivalent source and target frames is problematic when the training corpus available is not parallel, although this is the most realistic situation. The alignment task is even more difficult ...

Articulatory-based conversion of foreign accents with deep neural networks

We present an articulatory-based method for real-time accent conversion using deep neural networks (DNN). The approach consists of two steps. First, we train a DNN articulatory synthesizer for the non-native speaker that estimates acoustics from contextualized articulatory gestures. Then we drive the DNN with articulatory gestures from a reference native speaker, mapped to the nonnative articul...

Spectral Mapping Using Artificial Neural Networks for Intra-lingual and Cross-lingual Voice Conversion

CERTIFICATE: This is to certify that the work contained in this thesis, titled Spectral mapping using [...], has not been submitted to any other Institute or University for the award of any degree or diploma. (Signed: Mr. Kishore Prahallad) ACKNOWLEDGEMENTS: I would like to express my deepest appreciation to Kishore Prahallad, my advisor, for his guidance, encouragement and support throughout my duration a...

Adding Glottal Source Information to Intra-Lingual Voice Conversion

This paper studies the inclusion of glottal source characteristics in voice conversion (VC) systems. We use source/filter decomposition to parametrize the vocal tract using LSF, the glottal source using the LF model, and the aspiration noise using amplitude-modulated high-pass filtered AWGN noise. To evaluate the impact of this new parametrization in VC, we use a reference conversion system tha...

Journal

Journal title: IEEE/ACM Transactions on Audio, Speech, and Language Processing

Year: 2023

ISSN: 2329-9304, 2329-9290

DOI: https://doi.org/10.1109/taslp.2023.3271107